Learning Evaluation Functions for Large Acyclic Domains
Abstract
Some of the most successful recent applications of reinforcement learning have used neural networks and the TD(λ) algorithm to learn evaluation functions. In this paper, we examine the intuition that TD(λ) operates by approximating asynchronous value iteration. We note that on the important subclass of acyclic tasks, value iteration is inefficient compared with another graph algorithm, DAG-SP, which assigns values to states by working strictly backwards from the goal. We then present ROUT, an algorithm analogous to DAG-SP that can be used in large stochastic state spaces requiring function approximation. We close by comparing the behavior of ROUT and TD on a simple example domain and on two domains with much larger state spaces.
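The contrast the abstract draws between value iteration and working strictly backwards from the goal can be illustrated on a tiny deterministic graph. The following is a minimal sketch under stated assumptions, not the paper's DAG-SP or ROUT: the function names `dag_sp` and `value_iteration`, the dictionary graph encoding, and the edge costs are all hypothetical, and the stochastic transitions and function approximation that ROUT addresses are left out.

```python
from graphlib import TopologicalSorter

INF = float("inf")


def backup(successors, value, state):
    """One Bellman backup: cheapest edge cost plus successor value."""
    edges = successors.get(state, {})
    return min((cost + value[nxt] for nxt, cost in edges.items()), default=INF)


def dag_sp(successors, goal):
    """Cost-to-go on an acyclic graph, backing up each state exactly once.

    `successors` maps state -> {next_state: edge_cost} (illustrative encoding).
    """
    # TopologicalSorter expects node -> predecessors. Passing each state's
    # successors in that role yields an order in which every successor is
    # visited before the state itself, i.e. backwards from the goal.
    order = TopologicalSorter(
        {s: set(edges) for s, edges in successors.items()}
    ).static_order()
    value = {}
    for state in order:
        value[state] = 0.0 if state == goal else backup(successors, value, state)
    return value


def value_iteration(successors, goal):
    """Repeated full sweeps until no value changes; returns (values, sweeps)."""
    states = set(successors) | {t for e in successors.values() for t in e}
    value = {s: (0.0 if s == goal else INF) for s in states}
    sweeps, changed = 0, True
    while changed:
        changed = False
        sweeps += 1
        for state in sorted(states):  # fixed but arbitrary sweep order
            if state == goal:
                continue
            new = backup(successors, value, state)
            if new != value[state]:
                value[state] = new
                changed = True
    return value, sweeps


if __name__ == "__main__":
    # Hypothetical four-state acyclic task: a -> b -> c -> goal, plus a -> c.
    graph = {"a": {"b": 1, "c": 5}, "b": {"c": 1}, "c": {"goal": 1}}
    print(dag_sp(graph, "goal"))
    print(value_iteration(graph, "goal"))
```

Both routines reach the same values, but the backward pass touches each state once, while the sweeps revisit states whose successors were not yet correct. That repeated work is the inefficiency the abstract attributes to value iteration on acyclic tasks.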
Similar resources
Peer Assessment in evaluation of Medical sciences students
Introduction: Recently, peer assessment is especially noticed as a progress evaluation method. Although it is a known method, it is a novel method in many countries that they use traditional methods. Then the topic of current review article is peer assessment in medical education. Methods: The documents related to peer assessment, advantages, disadvantages, applications and how use it extracte...
Learning Evaluation Functions to Improve Local Search
This paper describes Stage, a learning algorithm that automatically improves search performance on large-scale optimization problems. Stage learns an evaluation function that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from features of states visited during search. The learned evaluation function is used to bias future search trajectories toward better opt...
Transitive orderings of properties of utility functions
This note considers orderings of properties (or assumptions) on utility functions and specifies domains on which those orderings are transitive or acyclic.
Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data
Paucity of large curated hand-labeled training data for every domain-of-interest forms a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversaria...
Learning Evaluation Functions to Improve Optimization by Local Search
This paper describes algorithms that learn to improve search performance on large-scale optimization tasks. The main algorithm, Stage, works by learning an evaluation function that predicts the outcome of a local search algorithm, such as hillclimbing or Walksat, from features of states visited during search. The learned evaluation function is then used to bias future search trajectories toward ...
Publication date: 1996